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ABSTRACT 

We present a novel method to significantly speed up cosmological parameter sampling. 
The method relies on constructing an interpolation of the CMB -log-likelihood based on sparse 
grids, which is used as a shortcut for the likelihood-evaluation. We obtain excellent results 
over a large region in parameter space, comprising about 25 log-likelihoods around the peak, 
and we reproduce the one-dimensional projections of the likelihood almost perfectly. In speed 
and accuracy, our technique is competitive to existing approaches to accelerate parameter esti- 
mation based on polynomial interpolation or neural networks, while having some advantages 
over them. In our method, there is no danger of creating unphysical wiggles as it can be the 
case for polynomial fits of a high degree. Furthermore, we do not require a long training time 
as for neural networks, but the construction of the interpolation is determined by the time it 
takes to evaluate the likelihood at the sampling points, which can be parallelised to an arbitrary 
degree. Our approach is completely general, and it can adaptively exploit the properties of the 
underlying function. We can thus apply it to any problem where an accurate interpolation of 
a function is needed. 
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1 INTRODUCTION 

The main two bottlenecks in cosmological parameter estima- 
. tion using the power spectrum of the cosmic microwave back- 
' ground (CMB) are the calculation of the th eoretical C; -spectrum 
, using Boltzma nn codes such as CMBFAST /Seliak & Zaldarr iagal 
. Il99d ). CAMB ( iLewis et al]|200d) . or CMBEASY (Doran 2005) and 
' the evaluation of the likelihood using the official WMAP likeli- 
, hood codfl There exist sev eral methods to speed up the calcula- 

■ tion of the power spectru m jjimenez et alj|2004l : iKaplinghat et al.l 

■ 12002"; 'Habib et al.' '2007^ or the W MAP likelihood function £ 
(Sandviketal. 2004; Fe ndt & Wan delt 2007; Auld et al. 2008). 
These methods are based on different techniques, such as analytic 
approximations, polynomial fits, and neural networks, which are 
all trained using a set of training points, for which the real power 
spectra and likelihood values have to be calculated. Once the codes 
are trained for a particular cosmological model, they can be used to 
evaluate the power spectrum or the likelihood function in every sub- 
sequent parameter estimation run, which significantly speeds up the 
Markov Chain Monte Carlo (MCMC) simulations used for param- 
eter estimation. Due to the ever-growing amount of available data, 
a fast evaluation of the likelihood is becoming of increasing impor- 
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tance, especially when combining CMB data with data-sets whose 
likeli hood is less e xpensive to evaluate. The Planck Surveyor mis- 
sion ( lTaubeill2OO0l) will be the upcoming challenge in this respect. 

In this work, we approximate the WMAP log-likelihood func- 
tion In C in the spirit o f CMBfit CSandvik et al. 2004) and Pico 
jpendt & Wandelll 2007h, wh ich work with polynomial fits, and 
CosmoNet i Auld et al. |2008 |). an approach based on neural net- 



works. In contrast to the fitting functions constructed therein, we 
introduce the technique of sparse grids in this context to construct 
an interpolation of In £, returning the exact function values at a set 
of sampling points. 

Most straightforward interpolation techniques are based on 
sets of sampling points in each dimension, typically based on (uni- 
form) grid structures — consider, e. g., piecewise d-linear or piece- 
wise d-polynomial interpolation schemes. Unfortunately, grid- 
based methods are only feasible in low-dimensional settings, as 
they suffer from the so-called curse of dimensionality: Spending 
iV function evaluations or grid points in one dimension leads to N'' 
grid points in d dimensions. The exponential dependency on the di- 
mensionality imposes severe restrictions on the number of dimen- 
sions that can be handled. Sparse grids, as introduced by Zenge^ 
( Il99lh . allow to overcome the curse of dimensionality to some ex- 
tent, at least for sufficiently smooth functions as it is the case in our 
setting. Sparse grid interpolation is based on an a priori selection of 
grid points, requiring significantly fewer grid points than conven- 
tional interpolation on a full grid, while preserving the asymptotic 
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error decay of a full-grid interpolation with increasing grid resolu- 
tion up to a logarithmic factor. This permits us to compute higher- 
dimensional interpolations and approximations than before. A very 
good overview about sparse grids, discus sing general properties, 
can be found in lSungartz & Griebell j2004) . 

The sparse grid technique is a completely general approach, 
not tailored to a single application, and can therefore be used to 
interpolate any function which is sufficiently smooth. Additionally, 
as it allows for arbitrary adaptive refinement schemes, the general, 
fast convergence rates can be improved even further, by adapting to 
the special characteristics of the underlying target function. 

We obtain excellent results, wh ich are competit ive to 

fitting procedures using polynomials dFendt & Wandelll l2007l : 
ISandvik et all2004h or neural networks jAuld et al.ll2008h in speed 
and accuracy. Furthermore, we believe that the interpolation based 
on sparse grids has several advantages over these approaches. First 
of all, we can use the resul ts of sparse grid approximation quality 
( iBungartz & Griebell [2004h . guaranteeing the convergence of the 
interpolation towards the original function with increasing number 
of grid points. 

Second, once we have chosen the volume in which we want to 
interpolate the function in question, the sparse grid structure itself 
determines a priori the location of potential sampling points (which 
can additionally be refined in an adaptive manner a posteriori). This 
makes it unnecessary to assemble a set of training points before- 
hand (e.g. by running MCMCs as it is done by Fendt & Wa ndeij 
1 I2OO7I) ). The generation of the sampling points and the construc- 
tion of the interpolant can be strongly parallelised, which makes 
the sparse grid approach an ideal candidate for computational grid 
projects such as the AstroGritfl. The time needed to construct the 
interpolant is determined almost only by the time it takes to evalu- 
ate the likelihood at the sampl ing points. We do not need additional 
training time as in the case ofl Auld et alj ( l2008h . 

Furthermore, polynomial fits to a set of training points run the 
risk of creating unphysical wiggles if the polynomial degree of the 
fitting function is chosen too high with respect to the amount of 
training points available. Using the sparse grids approach, piece- 
wise polynomials of low degree are sufficient, as we are not fitting 
certain evaluation points, but rather interpolating a function. In- 
creasing the accuracy is therefore equivalent to evaluating at more 
sampling points. 

Sparse grids are based on a hierarchical formulation of the 
underlying basis functions, which can be used to obtain a generic 
estimate of the current approximation error while evaluating more 
and more sampling points. This can be directly used as a criterion 
for adaptive refinement as well as to stop further refinement. 

Another advantage is that the projection of sparse grid interpo- 
lations can be done in a very fast and simple way. This would make 
sparse grids in principle a good candidate for sampling posteriors 
and projecting them directly, without having to use a Markov Chain 
approach in order to marginalise the posterior. Given that MCMCs 
need to determine the points sequentially and can therefore not be 
parallelised (apart from running several chains at the same time), 
it would be highly desirable to find alternatives that can be run in 
parallel. 

We have attempted to use sparse grids in order to substi- 
tute the MCMCs in cosmological parameter estimations. In or- 
der to directly project the posterior distribution we would need 
to sample the posterior rather than its logarithm. Since, in gen- 
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eral, the logarithm of a probability density f unction is considerabl y 
more well-behaved than the function itself, San dvik et al.l ( |2004|) . 
Ifendt & Wandelt (200'3), and^uld et al. (2008) all operate in log- 
space to speed up the generation of MCMCs instead. As the conver- 
gent phase of the interpolation with sparse grids sets in rather late 
when interpolating Gaussian functions (and thus the WMAP like- 
lihood, which is close to a d-dimensional Gaussian), we restricted 
ourselves to the log-likelihood and thus to accelerating MCMCs, 
too. 

The article is organised as follows. First, we describe the ba- 
sics of sparse grids in Sec.|2] introducing a modification of the stan- 
dard sparse grid approach, thus adapting the latter to our problem. 
In Sec. |3] we then present the interpolation of the WMAP like- 
lihood for two different sets of parameters in both six and seven 
dimensions. We show that the results obtained for regular (non- 
adaptive) sparse grids are already competitive to other approaches 
and demonstrate how adaptive refinement can further improve the 
results. Sec.|4]finally concludes our work. 



2 BASICS OF SPARSE GRIDS 

Standard grid-based approaches of interpolating a functi on / ex- 
hibit t he curse of dimensionality, a term going back to ISellmanl 
( Il96ll) : Any straightforward discretisation scheme which employs 
N grid points (or, equivalently, degrees of freedom) in one dimen- 
sion leads to iV'' grid points in d dimensions. For reasonable iV, 
the exponential dependency on the number of dimensions typically 
does not allow to handle more than four-dimensional problems. 

Sparse grids are able to overcome this hurdle to some ex- 
tent, requiring significantly fewer grid points than a full grid, 
while preserving the asymptotic error decay of full grid inter- 
polation with increasing grid resolution up to a logarithmic fac- 
tor. Sparse grids have originally been develope d for the solution 
of partial differential equations l lZengei|[T99l|)_ and have mean- 
while been applied to various problems, see Bungartz & Griebell 
(2004) and the references cited therein. Recent work on sparse 
grids includes stochastic and non-stochastic partial differential 
equations in various settings (v on Petersdorff & Schwab 20q3; 
iGanapathvsubramanian & ZabarasI 12007; Widmeretal. 2008), as 
well as ap plications in economi cs (Reis inger & Wittum 2007| ; 
lHoltzll2008l) . r egression dCarcke & Hegla ndl l2009l: Garckel [ioO^ . 
classification ("Bungartz et al.' '2 0081 : iGarcke et al.l |20o'i[k fuzzy 
modelling (Klimke et al. 2006), and more. Note that (non-adaptive) 
sparse grids are closely related to the technique of hyperbolic 
crosses ( lTemIvakovlll993l) . 

In this section, we provide a brief overview of sparse grids 
for interpolation. For a detailed derivation of the characteristics of 
sparse grids, we refer to Bungartz & Griebel (2004). We start by 
formulating the interpolation on a conventional full grid using hi- 
erarchical basis functions, from which we then derive the interpo- 
lation on a sparse grid by omitting the basis functions contributing 
least to the interpolation. 



2.1 General idea of interpolation on a full grid 

We consider the piecewise d-linear interpolation of a function 
/ : — >■ R which is given only algorithmically, i.e., we have no 
closed form of / but we can only evaluate / at arbitrary points using 
a numerical code. As we want to discretise our domain of interest 
f2, we restrict SI to a compact sub-volume of R''; here, Q. = [0, 1]'', 
the d-dimensional unit-hypercube. (For the standard approach of 
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Figure 1. One-dimensional piecewise linear interpolation u{x), dashed, of 
a function /(x), solid, (left) by a linear combination of hat basis functions 
(right). 

sparse grids techniques, we only consider functions that are zero 
on the boundary of the volume on which they are defined. This as- 
sumption will be dropped when we come to the interpolation of the 
log-likelihood of WMAP.) 

To construct an interpolant u of /, we discretise Q. via a reg- 
ular grid, obtaining equidistant grid points Xi, with mesh width 
/i„ = 2~" for some discretisation or refinement level n, at which 
we evaluate and interpolate /. If we define a suitable set of piece- 
wise d-linear basis functions Lpi{x), we can obtain u{x) from the 
space of continuous, piecewise d-linear functions V„ by combining 
them adequately as a weighted sum of basis functions, i.e. 

f{x) ^ u{x) = ^aiip^{x) 

i 

with coefficients cti. Figure [T] sketches the idea for a one- 
dimensional example, using the standard nodal basis. 

The curse of dimensionality, encountered when using a full 
grid, can be circumvented by a suitable choice of basis functions: 
We need a basis where the relevant information is represented by 
as few basis functions as possible. Most basis functions can then be 
omitted as they contribute only little to the interpolation of /, re- 
ducing a full grid to a sparse grid and allowing us to handle higher- 
dimensional functions than before. A suitable basis can be found by 
a hierarchical construction as introduced in the following section. 

2.2 Hierarchical basis functions in one dimension 

Sparse grids depend on a hierarchical decomposition of the under- 
lying approximation spaces. Therefore, and first considering only 
the one-dimensional case which we will later extend to d dimen- 
sions, we use the standard hat function, 

(^(x) = max(l — |a::j, 0) , (1) 

from which we derive one-dimensional hat basis functions by di- 
latation and translation, 

Lpi^^{x) = ip{2'-x ~ i) , (2) 

which depend on a level / and an index i, < i < 2' . The basis 
functions have local support and are centred at grid points = 
2~'i, at which we will interpolate /. Introducing the hierarchical 
index sets 

7; = 1 1 G N : 1 < i < 2' - 1, i odd I , (3) 
we obtain a set of hierarchical subspaces Wi, 

Wi = span {v5i,i(a;) : i £ h} ■ (4) 




Figure 2. One-dimensional basis functions ipi i and corresponding grid 
points xi i up to level n = 3 in the hierarchical basis (left) and the common 
nodal point basis (light). 

We can then formulate the space of piecewise linear functions Vn 
on a full grid with mesh width hn for a given level n as a direct 
sum of Wi, 

K=0H^i, (5) 

see Figure |2] Note that all basis functions of the same subspace 
Wi have the same size, shape, and compact support, that their sup- 
ports are non-overlapping, and that together they cover the whole 
domain. 

The interpolation u{x) G Vn can then be written as a finite 

sum, 

u{x) = ^ aLi^pi,i{x) , (6) 

where the so-called (hierarchical) surpluses ai^i are uniquely in- 
dexed by the same level and index as the corresponding basis func- 
tions. 

2.3 Higher-dimensional interpolation on a full grid 

The basis functions are extended to the d-dimensional case via a 
tensor product approach, 

d 

<fii,i{x) = Y[ipi.,i^{xj) , (7) 

J=l 

with the d-dimensional multi-indices I and i indicating level and 
index for each dimension. The other one-dimensional notations can 
be transferred to the arbitrary-dimensional case as well, consider, 
e.g. , the index set 

/( = I i : 1 < < 2'^ - 1, ij odd, 1 < j < d | , (8) 

the subspaces Wi, the space V„ of piecewise d-linear functions 
with mesh width h„ in each dimension, 

K= Wi, (9) 

leading to a full grid with (2" — 1)'' grid points, and to the inter- 
polant U{x) £ Vn, 

u{x) ^ ^ ai^iifi^iix). (10) 

Here and later on, we need the Zi-norm |Z|i = h ^^'^ 'he 

maximum-norm \l\oo = niaxi^j^d /j of multi-indices I. Figure 
[3] shows some 2-dimensional examples for the basis functions of 
the subspaces Wi, which correspond to anisotropic sub-grids with 
mesh- width hi^ in dimension j characterised by the multi-index I. 
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Figure 3. Basis functions of the subspaces Wi for |Z|oo ^ 3 in two dimen- 
sions. 

2.4 Sparse grids 

Starting from the hierarchical representation of Vn by the subspaces 
Wi, we can now select those subspaces that contribute most to the 
overall solution of the full-grid interpolation dlOl l. If the function 
we want to approximate meets certain smoothness conditions — the 
mixed second derivatives have to be bounded — this can be done a 
priori as we can derive bounds for the contributions of the different 
subspaces. We then obtain the sparse grid space 

V^^^= Wi, (11) 

|!|l^n+d-l 

leaving out those subspaces from the full grid space Vn with many 
basis functions of small support. (The exact choice of subspaces de- 
pends on the norm in which we measure the error; the result above 
is optimal for both the L2 norm and the maximum norm.) Note that 
in the one-dimensional case, the sparse grid space equals the full 
grid space. 

Figure |4] shows the selection of subspaces and the resulting 
sparse grid for n = 3, i.e. the sparse grid space V^^\ Com- 
pared to the full grid for the same discretisation level n (the full 
grid space V3 would also comprise the grey subspaces in Figure 
|4l(, this reduces the number of grid points (and therefore function 
evaluations and unknowns) significantly from 0{h~'^) — 0(2"'') 
to 0(h^^ iXoghn^)'^''^) - whereas the asymptotic accuracy de- 
teriorates only sUghtly fro m 0{h'i) to 0{hn{\oghn^Y^^), see 
iBungartz & Griebell ^2004) for detailed derivations. Figure[5]shows 
sparse grids in two and three dimensions for level n = Q each. 

Functions which do not meet the smoothness requirements 
or which show significantly differing characteristics (comprising 
steep regions as well as flat ones, e.g.) can be tackled as well, 
if adaptive refinement is used. The sparse grid structure i ll lb de- 
fines an a priori selection of grid points which is optimal if certain 
smoothness conditions are met and no further knowledge about the 
function in question is known or used. An adaptive (a posteriori) re- 
finement can additionally select which grid points in a sparse grid 
structure should be refined next, due to local error estimation, e.g. 
To refine a grid point, often all 2d children in the hierarchical struc- 




Figure 4. The two-dimensional subspaces Wi uptoi = 3(/i3 = 1/8) in 
each dimension. The optimal selection of subspaces (black) and the corre- 
sponding sparse grid on level n = 3 for the sparse grid space ^3'^'. The 
corresponding full grid of level 3 corresponds to the direct sum of all sub- 
spaces that are shown. 




Figure 5. Sparse grids in two and three dimensions for level n = 6. 



ture are added to the current grid, if they haven't been created yet. 
Note that it usually has to be ensured that all missing parents have 
to be created, as algorithms working on sparse grids depend on 
traversals of the hierarchical tree of basis functions. If additional 
knowledge about the problem at hand is available, it can be used 
in the criterion for adaptive refinement, allowing to better adapt to 
problem specific characteristics. 

2.5 Extension to functions that are non-zero on the boundary 

Up until now we have only considered functions that are zero on 
the domain's boundary 50.. To allow for non-zero values on the 
boundary, usually additional grid points located directly on 50. are 
introduced. For example, the one-dimensional basis on level one, 
containing only tp\,\(x), is extended by two basis functions with 
level and indices and 1 restricted to Q., namely ipofi(x) and 
tpo^\(x). They are then extended to the d-dimensional case as be- 
fore, with the exception that the new basis now contains basis func- 
tions on the modified level one with overlapping support. 

Apparently, this approach results in many more grid points 
(and therefore expensive function evaluations) than before. This 
shows quite nicely that it is not sufficient to just consider the 
asymptotic behaviour: asymptotically, nothing changes, but for 
very high dimensionalities we are not able to even start to inter- 
polate any more. In d dimensions, the basis for the subspace W\ 
for example contains already S'' basis functions, rather than a sin- 
gle one. Especially in settings where a very high accuracy close to 
the boundary is not required — which holds in our case — (or where 
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Figure 6. Modified one-dimensional basis functions ; : constant on level 
1 and "folded up" if adjacent to the boundary on all other levels. 



an adaptive selection of grid points is used in any case), it can be 
advantageous to omit the grid points on the boundary, and instead 
modify the basis functions to extrapolate towards the boundary of 
the domain. 

We modify the one-dimensional basis functions as follows: 
On level 1, we have only one degree of freedom; the best guess 
towards the boundary is to assume the same value, leading to a 
constant basis function. On all other levels, we extrapolate linearly 
towards the boundary, "folding up" the uttermost basis functions. 
All other basis functions remain unchanged, yielding 



with 

left / \ 



right / \ 



1 

ifi{x-2' -i) 



2-2' -x 




if ; = 1 A i = 1 , 

if ; > 1 A i = 1 , 

if Z > 1 A i = 2' - 1 , 
else , 



if s G 0, 



(12) 



else 



2' ■ a; + 1 - i if x- G [l - 
else 



(13) 
(14) 



Examples of the modified one-dimensional basis functions are 
shown in Figure[6] The d-dimensional basis functions are obtained 
as before via a tensor product of the one-dimensional ones. 



3 INTERPOLATION OF THE WMAP LIKELIHOOD 
SURFACE 

We now construct an interpolation of the WMAP log-likelihood, 
ln£, using sparse grids. In order to adapt the problem to our in- 
terpolation approach, we first use a 6-dimensional set of so-called 
normal parameters introduced in Sandvik et al. ( 2004), which are 
a transformation of the usual cosmological parameters such that 
the major axes of the Gaussian align with the coordinate axes. The 
logarithm of the likelihood is then well-approximated by a sum of 
one-dimensional parabolas in the different parameters, a fact that 
we will take advantage of by using the modified basis-functions 
described in ( 112b . For this set of normal parameters, we obtain 
an accurate interpolation already for a comparably low refinement 
level. This is shown for the 6-dimensional model as well as for a 
7-dimensional extension, using the running of the spectral index as 
an additional parameter. 



However, as a subsequent step we demonstrate that the param- 
eter transformation is not essential for obtaining a good interpola- 
tion. By investing more grid points, we obtain an accurate interpo- 
lation as well when using directly the 6- and 7-dimensional stan- 
dard parameter set, which is usually used in cosmological param- 
eter sampling. This approach shows the advantage of sparse grids 
of being rather generic. Furthermore, we are not restricted to the 
parameter range in which the transformation to normal parameters 
can be inverted. 



3.1 Choice of basis functions 

We use the modified basis functions as introduced in il2\ . which 
are well-suited for our problem. First, and as already indicated in 
Sec. (2] the region close to the domain's boundary is less impor- 
tant in our setting than the centre of SI: We will centre the domain 
of interest roughly at the maximum of the log-likelihood function 
In C and determine the boundary such that it includes the region 
with (In £max — ln£) < 25, which we will refer to as the 25 log- 
likelihood region (see Sec. 13. 31 ). 

Towards the boundaries of our intervals, the likelihood is then 
effectively zero and thus no great accuracy is needed in these re- 
gions. Therefore, we do not want to spend too much work on 50. 
Using the modified boundary functions, we extrapolate (d-linearly) 
towards S^l, see the discussion of the modified basis functions 
above. 

Second, the modifications are especially well-suited if the 
function to interpolate can be separated into a sum of one- 
dimensional functions. Assume that the likelihood C was a perfect 
product of one-dimensional Gaussians, 



£.{x) — c ■ e 



-ai(a:i-/Ji) 



-adi^d-IJ-d) 



(15) 



centred at {fii, . . . , fid)'^ . Then the interpolation of the log- 
likelihood In C reduces to d one-dimensional problems. 



with 



ln£(a;) = lnc + ^/fe(a;fc) , 



(16) 



(17) 



separating into a constant term plus a sum of functions that are 
constant in all directions but one. 

Keeping in mind that the one-dimensional basis function on 
level 1, ^1,1 (a;), is constant (cf. Figure |6ll, this simplifies the in- 
terpolation task. The d-dimensional basis function on level 1, 
<('i.i(a^), serves as an offset. (Only if (/ii, . . . , fid)'^ is the centre 
of fl, 01,1^51,1(3;) exactly expresses Inc.) Additionally, it is suffi- 
cient to spend only grid points on the main axes of the sparse grid 
(level 1 in all dimensions but one) to approximate the remaining 
one-dimensional contributions fk{xk) arbitrarily well: 

u{x) = 01,11(31, i(x) 




(18) 



Of course, C is not a perfect product of one-dimensional Gaus- 
sians; grid points that do not lie on the sparse grid's main axes ac- 
count for the additional mixed (correlated) terms of In £.. Given 
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that in sparse grids a large amount of points lie on the main axes, 
this mechanism works very well — the better, the less correlation 
between the different parameters exists. 

In order to take as much advantage as possible of the ef- 
fects described above, we introduce a parameter transformation 
in the following section, for which the new parameters are less 
correlated. However, the fact that the interpolation using the stan- 
dard parameters — which have much stronger correlations — works 
as well, spending just more grid points, will show that the sparse 
grid approach does not depend on this argumentation: Sparse grids 
can make use of such properties but do not rely on them. 



3.2 Normal parameters 

The set of cosmological parameters describing the ACDM model 
consists of the Hubble constant, h = km/'(sMpc) ' density 
parameter of vacuum energy, 0,^, the ones of baryons, Q,b, and of 
matter (baryonic -I- dark), Q.m, the optical depth to the last scatter- 
ing surface, r, the scalar spectral index of the primordial power 
spectrum, Us, and the scalar initial amplitude. As. We will re- 
fer to these parameters as cosmological parameters. For a more 
detailed description of the cosmological parameters, we refer to 
IColes & LuccMi] ( I2OO2I) . In the literature, there have been several 
attempts to transforming these parameters into a set of parameters 
that mirror the various physic al effects on the CMB power spec- 
trum (Hu et al. 2001; Kosowsk vetai]|2002h . In|chu et al. (2003), 
a set of parameters is provided in which the likelihood- surface of 
the CMB is well approximated by a multivariate Gaussian with the 
major axes aligned with the coordinate axes. In this work, we use 
the para meters given by | Sandvik et al.l j2004) . where the parame- 
ter set of IChu et al.l fcOolT is combined with the other parameter 
sets mentioned, in order to bring the major axes of the likelihood 
surface even closer to the coordinate axes. The new parameters are 
then {Qs, h,2, h3,t, A,, Z}, which we refer to as normal param- 
eters. When working with the latter, the logarithm of the likeli- 
hood is well-approximated by a sum of one-dimensional parabolas 
in the different parameters. The basis functions introduced above 
are therefore ideally adapted to this problem. In the following, we 
repeat the definitions of the normal parameters for convenience. 

The first parameter of our set is the angle subtended by the 
acoustic scale 



r.(ais) 180 
DA{ais) TV 



(19) 



where the index Is denotes the time of last scattering, DA(aiis) 
stands for the comoving angular diameter distance to the surface 
of last scattering (which we will come back to later), and (aia) is 
the comoving sound horizon at last scattering. 



rs(fflis) 



a{t) 



dt. 



(20) 



Here, Cs (t) denotes the sound speed for the baryon-photon-fluid at 
time t, which is well approximated by 



c.W^-i(l + 3^)-V 
3 



(21) 



with the index b standing for baryons and the index 7 for photons. 
Using the Friedmann equations and ignoring the vacuum energy 
at last scattering, rs(ais) can be shown to be cSandvik et aUi2004 : 



iKosowskv et alj2002l) 

, , 2^/3 



• ais , V 1 + -Ris + VRis + risRis 
■ In — 



1 + Vns-Ris 



where 

Ri. = 



- — - — r = 30 —- 

ais Vl03/ 



Pr(aiB) 
Pm{A,) 



= 0.042 TO„ 



(111. 
V103 



(22) 

(23) 
(24) 



The index r stands for radiation, i.e., pr consists of the sum of 
photon and neutrino energy densities, and the index m is used for 
matter (baryons -1- dark matter). We define Wm = ^mh^ in the 
same way as Wb = Qbh .There dshift at last scattering, zis, is well 
approximated by fau et al.ll200ll) 



= 1048 (1 + 0.00124 



<°-^^«)(l + 5i»^ 



gi = 0.0783iUi7°-^^*(l + 39.5w°-^^^)"^ 



(25) 
(26) 

52 = 0.560(1 + 21. 1^J;^*^)"^ (27) 

As already mentioned, Da{ciib) in l |19t denotes the comoving an- 
gular diameter distance to the surface of last scattering and is given 
by 



-DA(ais) 



Ho Ja, V^Aa'^ + flma + Q, 



-.da . 



(28) 



The second and third parameters in our set are the ratios of 
the second a nd the thi rd peak to the first peak in the Cf spectrum 
of t he CMB Jhu et al.li2001) . where the tilt-dependence is factored 
out jPage et al.ll2003l) . 



hi = 0.0264 w^" '"' 

exp (-0.476 [ln(25.5w6 + l.i 

2N 



/13 = 2.17 1 



\ 0.044/ 



1 + 1.63 1 



Wb \ 

I Wri 



0.071 



(29) 



(30) 



We use the tilt parameter given bv lSandvik et al. 



slightly modified version of the one in lChu et al 
minimise the correlation with Wb'. 



( 20041) . which is a 
20031) in order to 



t ; 



Wb 



.0.024 J 
The amplitude parameter is 

A 

A^ 



\ -0.5233 

1 2"= 



2.95 X 10- 



(31) 



(32) 



where kp = 0.05Mpc~^ denotes the pivot point. The normalisa- 
tion factor of 2.95 x 10~® comes in because we use the scalar 
amplitude As of CMBEASY, which is defined as the primordial 
power of the curvature fluctuations evaluated at the pivot point. 
As = A'ji{kp). It is related to the scalar amplitude As of CMB- 
FAST, which is used in lSandvik et al.l ( l2004h . by As = -y 

II I .2.y5xiu 
Verde et al.l2003h . Finally, we use the physical damping due to 

the optical depth to last scattering as our last parameter: 

Z = e-^. (33) 

In order to construct the interpolation of the likelihood surface, 
we need the transformation that maps the normal parameters back 
onto cosmological parameters. The reason for this is the way we 
construct the interpolation: Our sparse grid algorithm chooses the 
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normal parameters where it wants to refine the grid, which we then 
need to transform into cosmological parameters to run CMBEASY 
and the WMAP-likelihood code. Our technique of inverting the pa- 
rameter transformation is presented in appendix lAl 

3.3 Generation of test set and choice of interpolation range 

For choosing the parameter range in which to construct the inter- 
polation, we have run MCMCs containing about 50,000 points at 
a temperature of T = 3. That is, in the Metropolis algorithm we 
choose the transition probability a{x,y) from a point x in the 
chain to anew point yX.ohta{x,y) = min | {C{y) / C{x))~ , l|. 
Using this transition probability with T = 1 results in the usual 
Metropolis algorithm, whereas choosing T — 3 allows us to ex- 
plore a larger parameter range than with the regular algorithm. 
These chains covered a region reaching out to about 25 log- 
likelihoods around the peak. 

The optical depth to the last scattering surface, r, which can 
be determined from the CMB polarization, is not well-constrained 
by the WMAP polarization data due to their low signal-to-noise 
ratio. Therefore, when running the MCMCs at T = 3, we had to 
restrict r to the physically meaningful range r ^ 0. This restriction 
corresponds to Z ^ 1 for the normal parameters. In the case of the 
normal parameters in 7 dimensions, we had to additionally restrict 
the intervals to /12 ^ 0.52 and /13 ^ 0.38, which is the range in 
which the parameter transformation is invertible. Furthermore, we 
chose to restrict our set of points to be within the 25 log-likelihood 
region around the peak. 

In order to roughly centre our intervals at the maximum of 
the log-likelihood function, we have determined the latter using 
a few runs of a simple simplex search^ The interval boundaries 
were then defined as the box centred at the maximum which con- 
tains all points of the above-described chains. Note that it is not 
important for the accuracy of the interpolation that the intervals are 
well-centred at the maximum. Note further that we have used this 
set of points as a test set for comparing our interpolation with the 
real log-likelihood. 

3.4 Results 

We have interpolated the log-likelihood of the WMAP 5 year data 
in the 6-dimensional normal parameter space described in Sec. 13. 21 
The same has been done for a 7-dimensional model, in which we 
have chosen the running of the spectral index of the primordial 
power spectrum, a, as an additional parameter. Constructing the 
interpolation can be parallelised to an arbitrary degree, according 
to the available computational resources. 

For the 6-dimensional model, we plot the absolute error of 
the log-likelihood, (u — ln£), against the negative WMAP log- 
likelihood, (— ln£), for the points in the test set in Figure|7] We 
have used an interpolation on a sparse grid of level n — 5 (con- 
sisting of 2561 grid points) in the top panel, and of level n — 6 
(consisting of 10625 grid points) in the bottom panel. One clearly 
sees the improvement in accuracy when increasing the grid level 

We were running several simplex searches and chose the result with the 
highest value of the log-likelihood. The runs did not all converge to ex- 
actly the same point, which we think was due to numerical issues (the log- 
likelihood was presumably not completely convex, which could be due to 
the dips we will mention in Sec. [53). 
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Figure 7. Absolute error of the inteipolation with respect to the real log- 
likelihood in 6 dimensions for an interpolation with a sparse grid of level 5 
(top panel) and of level 6 (bottom panel) for normal parameters. 
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Figure 8. Absolute error of the inteipolation with respect to the real log- 
likelihood in 7 dimensions for an interpolation with a sparse grid of level 6 
(top panel) and of level 7 (bottom panel) for normal parameters. 
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from 71 = 5 to n = 6. Figure [8] shows the same plot for the 7- 
dimensional model, for grid level n = 6 (18943 grid points) in 
the top panel and n — 7 (78079 grid points) in the bottom panel. 
We again see the improvement in accuracy with increasing refine- 
ment level. However, the additional parameter a is quite strongly 
correlated with many of the other parameters, whereas the correla- 
tions between the normal parameters in 6 dimensions are reduced 
to a minimum. We therefore have to increase the grid level by one 
in 7 dimensions, in order to obtain results comparable to the 6- 
dimensional ones. In both figures, we note a systematic negative 
offset of the interpolation with respect to the real function, which 
becomes less severe for the higher refinement levels. This offset is 
due to the fact that we construct a d-linear interpolant of a convex 
function, which systematically lies below the function. This could 
be easily coped with by adding a small offset to Ofi^i after the inter- 
polation, or, even better, by using piecewise polynomial instead of 
the piecewise linear basis functions. We leave the usage of piece- 
wise polynomial basis functions, which promise to be well-adapted 
to the log-likelihood, for future work. 

Note that we have restricted the plot range to [-2,2], because 
only 0. 1 per cent or less of the points lie outside this rangeQ Al- 
most all of these points lie in the corners of Q due to relatively 
strongly correlated parameters. These are the regions in parameter 
space where the 25 log-likelihood range around the peak extends to 
the interval boundaries. Due to the extrapolation we use close to the 
boundaries (cf. the end of Sec.O, we obtain relatively large uncer- 
tainties in those regions, which do not affect the one-dimensional 
projections of the likelihood function, though. The uncertainties 
can be further reduced, spending (adaptively) more grid points in 
those regions, see also the discussion about adaptivity in Sec. 13. 51 

For the 6-dimensional interpolation with a sparse grid of level 
6, 2.5 per cent of the test points have an absolute error > 0.25 in the 
log-likelihood, and 0.03 per cent of the test points have an absolute 
error > 1. In 7 dimensions and for refinement level 7, the corre- 
sponding numbers are 9 per cent and 0.5 per cent, respectively. This 
is a h igher level of accuracy as reached by Pico jFendt & Wandeltl 
l2007h . for which about 90 per cent of the points in a region reaching 
out to 25 log-likelihoods around the peak have been calculated with 
an absolute error below 0.25. However, we note that these numbers 
for Pico are valid for a 9-dimensional parameter space, whereas 
we work in 6- and 7-dimensional spaces and leave the extension to 
higher-dimensional models to future work. But we also note that 
in all settings where a systematic offset in the interpolation error 
can be observed, it is sufficient to reduce the offset to improve our 
results significantly, in particular for interpolations on lower levels 
(see, e.g., the scatterplot for the 6-dimensional model and grid level 
5, Figure |7). 

We have projected both the interpolation and the WMAP like- 
lihood function using MCMCs of about 150,000 points, and com- 
pare the results for the 7-dimensional model for grid level n = 6 in 
Figure |9] We reproduce the projected one-dimensional likelihood 
curves almost perfectly. The results for the 6-dimensional model 
for grid level n — 5 are very similar to the 7-dimensional ones. 
Note that we obtain these excellent results for the 6-dimensional 
model using only 2561 grid points. The visual comparison of our 
results with the proje cted one-dimensional likelihoods obtained by 
CosmoNet jAuld et al.. 2007. ,2008.) shows that we reproduce the 




a 



Figure 9. Compaiison of the one-dimensional projections of the 7- 
dimensional WMAP 5 year likelihood (solid) and its interpolation (dashed) 
using a sparse grid of level n = 6 (consisting of 18943 grid points) for 
normal parameters. The curves match almost perfectly. 



original curves with a higher accuracy than the latter. Note also 
that our interpolation is constructed in a rather wide region, en- 
compassing abo ut 25 log-likelihoods around the peak, whereas in 
lAuld e t al. ( 2008) the region in which In £ was fitted covers only 
4a around the peak for the combined likelihood of CMB and LSS. 
This corresponds to a region of about 8 log-likelihoods around the 
peak for the combined likelihood, and even less when using only 
the CMB likelihood. 

Consider now the interpolation of the WMAP likelihood 
surface using directly the standard parameters, which are used 
by default when doing cosmological p arameter sampli n g wit h 
the MCMC driver from CMBEASY jPoran & MiiUeil l2004h : 
{wm, Wb, h, T, Tis, ln(10^°As) — 2r}, to which we again add a as 
an additional parameter in the 7-dimensional case. Working with 
these parameters has the advantage that we do not have to restrict 
ourselves to the parameter range in which the parameter transfor- 
mation is invertible. However, the problem is now less adapted to 
our choice of basis functions, due to the stronger correlations be- 
tween the different parameters. We therefore pay the price of hav- 
ing to increase the grid level by one in this case in order to reach an 
accuracy as good as before. We show the absolute error of the log- 
likelihood, (u — In £), against the negative WMAP log-likelihood, 
(— ln£), for the 6-dimensional model for grid level n — 6 (10625 
points) and n — 7 (40193 points) in Figure [ToFI and for the 7- 
dimensional one for n = 7 (87079 points) and n — 8 (297727 



In 6 dimensions, the number of points outside this range is 0.02 per cent 
(0.003 per cent) for n = 5 (n = 6); in 7 dimensions, it is about 0. 1 per cent 
for both grid levels. 



^ Here, about 0.3 per cent (0.2 per cent) of the points in the test set lie 
outside the chosen plot-range for the grid of level n = 6 (n = 7). 
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Figure 10. Absolute error of the interpolation with respect to the real log- 
likelihood in 6 dimensions for an interpolation with a sparse grid of level 6 
(top panel) and of level 7 (bottom panel) for standard parameters. 
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Figure 11. Absolute eiTor of the inteipolation with respect to the real log- 
likelihood in 7 dimensions for an interpolation with a sparse grid of level 7 
(top panel) and of level 8 (bottom panel) for standard parameters. 



points) in Figure For the 6-dimensional (7-dimensional) inter- 
polation with a sparse grid of level n — 7 (n = 8), the fraction 
of test points with absolute error > 0.25 in the log-likelihood is 
6 per cent (20 per cent), and 0.5 per cent (2.5 per cent) for an ab- 
solute error > 1. Again, the one-dimensional projections of the 
6-dimensional case for level n — 6 and of the 7-dimensional case 
for level n = 7 are very similar to the ones in Figure[9l and are thus 
not shown. 

We have tested the evaluation time of our interpolation by 
evaluating a sparse grid interpolant of level 6 in 6 dimensions for 
2,000,000 points randomly chosen from within Q. On a conven- 
tional desktop computer (Intel chipset, 2.8 GHz), this took about 
92 /is per point, including the random generation of the point. In 7 
dimensions on the same level we have twice as many grid points 
and one dimension more, which doubles the evaluation time to 
189 us. For CosmoNet and Pico, the evaluation of a 6-dimensional 
model is specifie d to take about 10 jis and 250 fis, respectively 
jAuld et al.ll2008l) . Note that we do not know on which hardware 
the evaluation times of CosmoNet and Pico have been measured, 
which makes a comparison hardly possible. Note further that our 
code to evaluate a sparse grid function is not optimised for fast 
evaluation times and that there is still room for improvement. In 
any case, for all of these codes the bottleneck in cosmological pa- 
rameter sampling is now the MCMC algorithm itself rather than the 
evaluation of the likelihood, at le ast with the MCMC driver used in 
this work dPoran & Mulleill2004) . 

Note that in 7 dimensions, we need significantly more grid 
points than in 6 dimensions, since the additional parameter a 

^ About 1 per cent of the points in the test set lie outside the chosen plot- 
range for the grid of both level n = 7 and level n = 8. 



strongly correlates with the other parameters, and we thus need to 
increase the grid level by one to obtain good results. As the stor- 
age requirements are rather low, this mainly increases the number 
of evaluations that are needed for constructing the sparse grid in- 
terpolation. As already stated before, though, the construction of 
the interpolation can be parallelised to an arbitrary degree, accord- 
ing to available computational resources, so that this should not be 
an issue. To store the interpolant for a regular sparse grid in d di- 
mensions for level I with grid points, we would only need A'' 
doubles for the coefficients and two integers to remember both d 
and leading to (TV + 1) ■ 8 Bytes. For adaptively refined sparse 
grids, we additionally have to store at least which grid points have 
been refined, requiring slightly more storage. For current hardware 
architectures, the size of the memory is therefore not a limiting fac- 
tor for our application. 

We have shown in this section, that the interpolation of the log- 
likelihood with regular sparse grids provides excellent and compet- 
itive results in both 6 and 7 dimensions, even when interpolating in 
the rather wide range covering about 25 log-likelihoods around the 
peak. The memory requirements to store the sparse grid are rather 
low and the evaluation times are very fast. We have demonstrated 
that the one-dimensional projections of the interpolants match the 
original ones almost perfectly, and that the scatterplots of the errors 
exhibit only a small fraction of grid points with higher errors. For 
standard parameters, more grid points have to be spent to obtain the 
same accuracy than for normal parameters, as the latter ones corre- 
late less. In the next section, we will therefore focus on an adaptive 
extension of the method used so far to further reduce the number of 
grid points that are required. 
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3.5 Improvements with adaptive sparse grids 



ccjv+i is changed by 



As it has already been mentioned, the log-hkelihood is not a perfect 
sum of one-dimensional functions. The different parameters con- 
tribute differently to C and correlate more or less with each other. It 
is therefore reasonable, especially when using the standard param- 
eters which correlate more, to employ adaptivity, spending more 
grid points in critical regions and less grid points elsewhere. In this 
section, we demonstrate the utility of adaptivity by showing some 
first results as a proof of concept. As they can clearly be improved 
further, we leave a thorough study of adaptive sparse grids for the 
interpolation of In C to future work. We start by specifying how to 
refine, we formally derive a suitable criterion that can be used to 
specify where to refine as well as to measure the quality of an in- 
terpolation, and we finally provide results that show how adaptivity 
can improve the interpolation obtained for regular sparse grids. 

Employing adaptivity, one can attempt to either obtain better 
results fixing roughly the number of grid points used, or to achieve 
a similar accuracy using less grid points. In the following, we show 
results for the former, tackling the 7-dimensional example using 
the standard parameters on level 7 with 78079 grid points presented 
above. We start with a regular sparse grid of some low level and re- 
fine grid points, creating all 2d children in the hierarchical structure 
(if possible) each, until the grid size exceeds 78000 grid points. In 
settings where the contributions of the dimensionalities differ sig- 
nificantly, it can be useful to start with level 2 to allow dimensional 
adaptivity, neglecting unimportant dimensions; here, the grid points 
on low levels will be created in any case, so we can start with a 
sparse grid on level 5, e.g., to save on the number of adaptive steps. 

Choosing a suitable refinement criterion, it can be determined 
whether to refine in a broad way (close to regular sparse grids) or 
in a more greedy way in the sparse grid's hierarchical tree struc- 
ture. It is reasonable to take the surpluses of the grid points into ac- 
count as they contain the local information about the functions, i.e., 
if the function has a high gradient locally. Furthermore, they de- 
cay quickly with increasing level-sum in the convergent phase. The 
mere surplus-based criterion, refining the grid points with the high- 
est absolute value of the surplus first, is known to tend to minimise 
the L2-norm of the error. As we do not spend grid points on the do- 
main's boundary, and as we are extrapolating towards the boundary, 
the biggest surpluses per level can be found for the modified basis 
functions which are adjacent to the boundary. A mere surplus-based 
criterion will therefore only refine towards the boundary. This re- 
duces the error especially for sampling points with a high error in 
the scatterplots, as they are located towards the boundary of the 
domain. 

In the following, we theoretically derive a refinement crite- 
rion which is better suited to our problem than the purely surplus- 
based one. In order to maximise the information our interpola- 
tion contains about the real likelihood, we attempt to minimise the 
KuUback-Leibler distance oLkl between the interpolation and the 
likelihood function, 



dKL 



d'^x C{x) In 



C{x) 



exp(M(a;)) 
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d X C{x) (ln£(a;) — '"(a;)) 



(34) 



which is defined for two normalised probability distributions C and 
exp(u). Let us now derive the refinement criterion we obtain from 
minimising duL- Assume that we have already computed an in- 
terpolation u{x) with N grid points, then the Kullback-Leibler 
distance dKL when evaluating the function at an additional point 
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(35) 



If we refine the interpolation around the grid point that contributed 
most to the Kullback-Leibler distance, we can hope to converge to- 
wards the minimum of dKL fastest. In order to obtain a suitable 
refinement criterion, we have to simplify the formula in l l35t con- 
siderably. We thus assume the likelihood C{x) as well as the basis 
function ipN+\(x) to be locally constant on (/3jv+i's support, ob- 
taining 



diicw 
I KL 



jold 
" 0-KL 



Vjv+i'C(a;jv+i) lajv+i] 



(36) 



where Vjv+i is the volume covered by the basis function tpN+\ 
(i.e. its support), and we have used ipN+i{xN+i) ~ 1. 

With l |36t , we have derived an estimation of the contribution of 
a basis function to dKL, which is a reasonable refinement criterion 
in our setting. In addition to the surplus of the grid point, | qjv-|- 1 1 , it 
takes into account the value of the likelihood £{xn+i) at the grid 
point, and the volume of the basis function Vn+i- The likelihood 
takes care of the fact that we would like to be more accurate where 
the likelihood is higher. The regions of very low likelihood are less 
interesting for us — the likelihood being already very close to zero 
beyond a difference of about 20 log-likelihoods. The volume factor, 
on the other hand, prevents the interpolation to refine too deeply 
(to very high grid levels) locally in the parameter space. However, 
since this usually only takes effect after several refinement steps, 
and as we have restricted the number of grid points, we choose not 
to include the volume factor but rather to refine several points at 
the same time, which addresses this issue in an alternative way, and 
which will be discussed later on. We further choose to introduce a 
temperature T again, which allows us to weight the likelihood with 
respect to the surplus and thus to influence how much to refine close 
to the maximum. The refinement criterion we used in this study is 
thus 



C{xi^i] 



l/T 



\ai. 



(37) 



where we have divided the likelihood by its peak value, £max, 
(which we have already obtained determining the interpolation do- 
main) because the WMAP code returns the log-likelihood only up 
to a constant offset, so that we do not know the correct normalisa- 
tion of jC. For T = 1, refinement takes place only very close to the 
maximum as C decays quickly; a temperature of T = 6 showed to 
provide good results within the whole domain of interest. 

Refining only one grid point per refinement step often causes 
adaptivity to get stuck in a single, special characteristic of the func- 
tion. Interpolating In £ with our choice of basis functions, all grid 
points are likely to be created only in the direction where the log- 
likelihood decays fastest, or around one of the local dips we will 
address later on. Refining more than one grid point at the same time 
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Figure 12. Absolute error of the interpolation with respect to the real log- 
likelihood in 7 dimensions for an interpolation with an adaptively refined 
sparse grid for standard parameters. 

helps to circumvent such effects, resulting in a broader refinement 
scheme. 

The Kullback-Leibler distance duL can also be used to mea- 
sure the quality of our interpolation: The distance between the real 
likelihood and or interpolation should be as small as possible. How- 
ever, as we have already mentioned above, we do not know the nor- 
malisation of the WMAP likelihood function. Therefore, cLkl is 
not necessarily positive and thus looses its property of being a use- 
ful measure of the 'closeness' of the two functions. We thus use a 
slightly modified version, 

dKL = J d'^x C{x)\\nC{x) — u{x)\ , (38) 

as a measure of the quality of our interpolation, instead of the ac- 
tual Kullback-Leibler distance. It can be easily calculated from an 
MCMC with T = 1 obtained for jC, by simply averaging the ab- 
solute errors | ln£(a;i) — ii(a;i)| over all points in the chains. Fur- 
thermore, we quote this value averaged over a chain of T = 3 
(exploiting the interpolation domain better), which corresponds to 
J d-'x £.(x)T'\ln£.(x) - u{x)\. 

Figure [12] shows the scatterplot for an adaptively refined 
sparse grid in 7 dimensions. Starting from a regular grid of level 
5, we refined 100 grid points each according to the refinement cri- 
terion ([37} with r = 6. Needing only about as much grid points 
(78551) as for the regular sparse grid of level 7, Tab. [T] shows that 
we obtain results which are close to those of a regular sparse grid 
of level 8 with almost 4 times as many grid points. We provide the 
Mean Squared Error (MSE) as well as dxL for both T — 1 and 
T = 3 chains for regular sparse grids of level 7 and 8, and for 
the adaptively refined case. We also quote how many points exhibit 
an absolute error larger than 1 or 0.25 for the T — 3 chains. We 
do not show the histograms of the adaptively refined model, as the 
histograms for both the regular grid on level 8 and the adaptively 
refined one, are very close to the histograms for normal parameters 
in 7 dimensions shown in Figure[9| 

We would like to mention, that, due to numerical problems, 
the current version of CMBEASY produces local, unphysical and 
sometimes rather high dips. This problem is already known and 
will be corrected in the next release. For stochastic approaches, this 
is not a big problem, though: The dips are local and just cause some 
noisy evaluations. But it poses a problem for numerical approaches 
if a grid point hits a dip. Then it can happen, that spending more 
grid points can even deteriorate the results. For our regular grid in 6 
dimensions using the standard parameters, e.g., increasing the level 
from 7 to 8 caused a higher overall error on the chain-data used 



for the histograms, as especially two new basis functions close to 
the peak caused an error of up to 12 of the log-likelihood for all 
evaluations affected by those basis functions. 

Fortunately, dips can be detected automatically due to the hi- 
erarchical structure of the sparse grid and the smoothness of In C, 
using a criterion that is once more based on the surpluses. Further- 
more, it is not a severe problem when using adaptivity, as adaptivity 
localises the effects of the dips automatically. One just has to take 
care not to spend too much grid points to compensate for the dips. 

The first adaptive results are promising, but there is still room 
for a lot of improvement. Even better refinement criteria than 
those used so far could be employed. Using not only piecewise 
linear functions, but rather piecewise polynomials, and applying 
adaptivity in both the mesh-width and the polynomial degree is 
very promising; especially the extrapolation properties towards the 
boundary would be improved, and less grid points would be needed 
to obtain the same accuracies. 



4 CONCLUSIONS 

In this work, we have explored the utility of interpolating the 
WMAP log-likelihood surface using sparse grids. We demonstrated 
that the results are excellent and competitive to other approaches 
regarding speed and accuracy, and we discussed advantages over 
fitting the likelihood surf ace with polynomial s (.Fendt & Wandelj 
l2007i : ISandvik et alj2004) or neural networks jAuld et alj2008l) : 

The interpolation based on sparse grids converges towards the 
exact function in the limit of the grid level going to infinity. We 
can therefore reach an arbitrary accuracy by simply increasing the 
amount of work we spend. In the case of a polynomial fit, this is not 
guaranteed since increasing the polynomial degree runs the risk of 
becoming unstable. 

In order to construct the sparse grid interpolation, we do not 
need to sample a set of training points using MCMCs beforehand, 
since the sampling points are determined by the sparse grid struc- 
ture which is given a priori. Once we have chosen the volume of 
interest, the time for constructing the interpolation is dominated by 
the evaluation of the likelihood function at the grid poi nts. We do 
not ne ed additional training time as for neural networks ( lAuld et aEI 
|2008|) . for example. Constructing the interpolation can thus be done 
almost arbitrarily in parallel, only limited by the computational re- 
sources that are available. 

The sparse grid technique is rather general and not restricted 
to certain classes of functions. In particular, the choice of sampling 
points and basis functions is not tailored to a single problem as 
for neural networks, where the topology of the network has to be 
chosen problem-specifically (often in a heuristic way). The sparse 
grid interpolation technique as well as our extensions can therefore 
be readily applied to other problems in astrophysics and cosmology, 
and will be useful in further tasks, where an accurate interpolation 
of a function is needed. 

The excellent performance of the sparse grid interpolation can 
be further improved, leaving future research to do: It can be applied 
to models with more than seven parameters by spending more com- 
putational effort. Further modification of the basis functions, for ex- 
ample allowing for a piecewise polynomial interpolation, promises 
better convergence rates and higher accuracies. Adaptive refine- 
ment schemes, which take into account the characteristics of the 
interpolated function, can be used to further increase the accuracy 
of the interpolation, as we have already demonstrated for a first ex- 
ample in this work. 
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err > 1, T=3 


err > 0.25, T=3 


MSB, T=l 


MSB, T=3 




duL, T=3 


level 7 


4.2% 


50.5% 


0.087 


0.532 


0.256 


0.354 


level 8 


2.3% 


19.3% 


0.017 


0.210 


0.091 


0.193 


adaptive 


1.8% 


23.6% 


0.027 


0.202 


0.110 


0.204 



Table 1. Comparison of errors of regular sparse grids of level 7 and level 8, respectively, and an adaptively refined sparse grid using approximately as many 
grid points as contained in the regular grid of level 7. Shown are the number of points with an absolute error larger than 1 or 0.25 in the T = 3 chains, the 
MSB for chains of T = 1 and T = 3, and dxL^ which denotes the absolute value of the error averaged over chains of T = 1 and T = 3. 
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APPENDIX A: INVERSION OF THE PARAMETER 
TRANSFORMATION 

In the following, we present a technique of inverting the parameter 
transformation of Sec. l3.2l to compute the cosmological parameters 
given the normal parameters. The normal parameter /12 in terms of 
cosmological parameters is given by 

h2{w^,Wi) = 0.0264 m,""-^''' 

exp (-0.476 [ln(25.5u>6 + 1.84 . (Al) 



We solve this equation for Wm as a first step: 



exp < ± 



-25.5 Wh 



0.476 



In 



0.0264 



0.762 



1 1/2' 



(A2) 



Inconveniently, there exist two different solutions for 'Wm{h2, Wb), 
which complicates the inversion. We now substitute Wm in 
h3{'Wm,Wb) OOt for l lA2t and thus obtain hz(h2,'Wb), which, of 
course, has two solutions as well. An example of the two branches 
of h-i{h2, Wb) for h2 = 0.45 is depicted in Fig. lAll We can com- 
pute the critical point where only one solution exists using the con- 
dition 



0.476 



■In 



h2 



0.0264 



0.762 
Wb 



= 0, 



(A3) 
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Figure Al. The two branches of versus ui^ for h2 = 0.45. 



as can be seen from ( IA2b . This condition gives us tiie following 
formulae for the parameter values at the critical point: 

m,crit(/l2) = ( 1 , (A4) 

Wm,ciit{h2) = (1 — 25.5 W6,crit) , (A5) 

l.o4 

ft3.cHt(/l2) = 2.17 



(l + 1.63 (l - ^^^) i«„,crit) . (A6) 



The two parameters h2 and h-j, can now be inverted to Wm and Wb- 
For a given /12, we express /13 in terms of h2 and wi,, as described 
above. We then use ft3,crit('i2) to choose the upper branch of 
hs,{h2,'Wb) if our given is bigger than fea.crit (/i2), and the lower 
branch if it is smaller. Using the respective branch of hz(h2,Wb), 
we search numerically in Wb until hz{h2,Wb) matches the given 
/i3. Substituting that value of Wb into equation JA2I (. we readily 
obtain the value for Wm- 

Now it is straightforward to compute the values for Ua and 
As from t an d To obtain h from 0^, we follow the procedure 
suggested by iKosowskv et alj j2002l) . expressing Qs in terms of 
h in terms of h and then searching in h numerically until Qs{h) 
matches the given value of Qs. 
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